1. Introduction

Our goal in this paper is to give simple proofs of the Weak Law of Large Numbers and the Central Limit 
Theorem for independent and identically distributed random variables. Our proofs make only the minimal 
necessary assumptions: that the mean exists for the Weak Law of Large Numbers, and that the mean and 
variance exist for the Central Limit Theorem. We use only straightforward manipulations of probabilities, 
expectations and distribution functions. In particular, we make no use of characteristic functions or
other transform techniques, nor of any operator-theoretic methods. We use the Stieltjes integral
$\mathrm{Ex}[X] = \int x \, dF_X(x)$ (where $F_X(x) = \Pr[X \le x]$ is the distribution function of $X$) as the definition of
expectation. Our proofs hold in complete generality if this integral is interpreted as a Lebesgue-Stieltjes
integral. But everything we do will also be valid if one interprets this definition as the combination of a 
Riemann integral (for the continuous component of the distribution) and a sum (for the discrete component), 
ignoring the possibility of a singular component. The only property of the normal distribution that we use 
is that a sum of independent normally distributed random variables is again normally distributed, with the 
means and variances added, as is shown in Ross [3, pp. 256]. Finally, the only analysis that we use is the 
definition of a limit and Taylor's Theorem (with the Lagrange form of the remainder term). 

Our proof of the Central Limit Theorem is inspired by that of Trotter [2]; in principle, all we have done 
is to transform those parts of his proof involving operators and other notions from functional analysis into 
straightforward manipulations of probabilities, expectations and distribution functions. And Trotter himself 
says: "Our proof is in principle the same as that used by Lindeberg [1]." So our simple proof actually has a 
long lineage. The observation that the same technique also yields the Weak Law of Large Numbers appears 
to be new. 

It is in fact the idea of proving both the Central Limit Theorem and the Weak Law of Large Numbers 
with a single argument that yields the key to the proof. In proving any convergence result, it is always 
tempting to use a metric, such as the supremum norm, since then one can use the triangle inequality and
other tools from the theory of metric spaces. But the supremum norm governs uniform convergence, and 
while uniform convergence does indeed take place in the Central Limit Theorem, there is in general only 
pointwise convergence in the Weak Law of Large Numbers, and pointwise convergence is not governed by any 
metric. This suggests the idea of smoothing the distribution functions of the random variables by convolving
them with a smooth distribution function (that is, by adding an independent smoothly distributed random 
variable). In the case at hand, the random variables we choose to add are beta-distributed with various 
parameters. 

We prove the Central Limit Theorem in the following form. 

Theorem 1: Let $Z, Z_1, \ldots, Z_n$ be independent and identically distributed random variables with
$\mathrm{Ex}[Z] = 0$ and $\mathrm{Var}[Z] = \mathrm{Ex}[Z^2] = 1$. Let $U_n = Z_1 + \cdots + Z_n$ and let $S_n = U_n / n^{1/2}$.
Let $N$ be normally distributed with $\mathrm{Ex}[N] = 0$ and $\mathrm{Var}[N] = 1$. Then
$\lim_{n \to \infty} F_{S_n}(t) = F_N(t)$ for all real $t$.
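
As a numerical illustration of the statement (not part of the proof), the following sketch evaluates $F_{S_n}(t)$ exactly for one particular choice of $Z$ and compares it with $F_N(t)$. The choice of $Z$ taking the values $+1$ and $-1$ with probability $1/2$ each, the sample points, and all function names are assumptions made for this example only.

```python
import math


def std_normal_cdf(t: float) -> float:
    """Distribution function F_N(t) of the standard normal variable N."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def binom_cdf(n: int, m: int) -> float:
    """Pr[K <= m] for K ~ Binomial(n, 1/2), summed with exact integers."""
    if m < 0:
        return 0.0
    m = min(m, n)
    return sum(math.comb(n, k) for k in range(m + 1)) / 2 ** n


def clt_cdf(n: int, t: float) -> float:
    """F_{S_n}(t) when each Z_i is +1 or -1 with probability 1/2 (illustrative choice).

    U_n = 2K - n with K ~ Binomial(n, 1/2), so S_n <= t iff K <= (n + t sqrt(n)) / 2.
    """
    return binom_cdf(n, math.floor((n + t * math.sqrt(n)) / 2))


def clt_gap(n: int, ts=(-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0)) -> float:
    """Largest |F_{S_n}(t) - F_N(t)| over the sample points ts."""
    return max(abs(clt_cdf(n, t) - std_normal_cdf(t)) for t in ts)


# The gap at the sample points for a few values of n.
for n in (25, 100, 400):
    print(n, clt_gap(n))
```

This choice of $Z$ satisfies $\mathrm{Ex}[Z] = 0$ and $\mathrm{Ex}[Z^2] = 1$, so Theorem 1 applies to it; the sketch only checks finitely many values of $t$ and $n$.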

The Central Limit Theorem is often stated in a more general form in which the mean and variance of $Z$
are not assumed to be $0$ and $1$, respectively. But this form actually reduces to the special case of Theorem
1: if $\mathrm{Ex}[Z'] = \mu$ and $\mathrm{Var}[Z'] = \sigma^2$, applying Theorem 1 with $Z = (Z' - \mu)/\sigma$ yields the conclusion that the
limiting distribution of $(Z'_1 + \cdots + Z'_n - n\mu)/n^{1/2}$ is normal, with mean $0$ and variance $\sigma^2$.

We shall prove the Weak Law of Large Numbers in the following form. 

Theorem 2: Let $Z, Z_1, \ldots, Z_n$ be independent and identically distributed random variables with
$\mathrm{Ex}[Z] = 0$ and $\mathrm{Ex}[|Z|] = 1$. Let $U_n = Z_1 + \cdots + Z_n$ and let $S_n = U_n / n$. Let $D$ be a
deterministic random variable with $\Pr[D = 0] = 1$. Then $\lim_{n \to \infty} F_{S_n}(t) = F_D(t)$ for all real $t$
at which $F_D(t)$ is continuous (that is, all $t \ne 0$).

The Weak Law of Large Numbers is often stated in terms of "convergence in probability" (that is,
$\Pr[|S_n| > \varepsilon] \to 0$ as $n \to \infty$ for all $\varepsilon > 0$, where $S_n = U_n/n$ as in Theorem 2), rather than in
terms of "convergence in distribution", as we have done. But, although convergence in probability is in
general stronger than convergence in distribution, in the case of convergence to a deterministic value they
are equivalent, since

\[
\begin{aligned}
\Pr[|S_n| > \varepsilon] &= \Pr[S_n > \varepsilon] + \Pr[S_n < -\varepsilon] \\
&\le 1 - F_{S_n}(\varepsilon) + F_{S_n}(-\varepsilon),
\end{aligned}
\]
and convergence in distribution yields $F_{S_n}(\varepsilon) \to 1$ and $F_{S_n}(-\varepsilon) \to 0$ as $n \to \infty$ for all $\varepsilon > 0$.
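
To make the two modes of convergence concrete, the following sketch computes the distribution function of the normalized sum $U_n/n$ exactly for one illustrative choice of $Z$, and evaluates the tail bound built from its values at $\pm\varepsilon$. The choice of $Z$ ($+1$ or $-1$ with probability $1/2$ each, so that $\mathrm{Ex}[Z] = 0$ and $\mathrm{Ex}[|Z|] = 1$) and all function names are assumptions of this example.

```python
import math


def binom_cdf(n: int, m: int) -> float:
    """Pr[K <= m] for K ~ Binomial(n, 1/2), summed with exact integers."""
    if m < 0:
        return 0.0
    m = min(m, n)
    return sum(math.comb(n, k) for k in range(m + 1)) / 2 ** n


def wlln_cdf(n: int, t: float) -> float:
    """Distribution function of U_n / n at t, with each Z_i = +1 or -1 equiprobably.

    U_n = 2K - n with K ~ Binomial(n, 1/2), so U_n / n <= t iff K <= n (1 + t) / 2.
    """
    return binom_cdf(n, math.floor(n * (1.0 + t) / 2.0))


def tail_bound(n: int, eps: float) -> float:
    """The bound 1 - F(eps) + F(-eps) on Pr[|U_n / n| > eps]."""
    return 1.0 - wlln_cdf(n, eps) + wlln_cdf(n, -eps)


# The bound shrinks with n, while the value at t = 0 stays near 1/2.
for n in (10, 100, 1000):
    print(n, tail_bound(n, 0.1), wlln_cdf(n, 0.0))
```

For this choice of $Z$ the value of the distribution function at $t = 0$ stays close to $1/2$ rather than approaching $F_D(0) = 1$, which is why Theorem 2 excludes the point $t = 0$.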

The Weak Law of Large Numbers is often stated in a more general form in which the means of $Z$ and $|Z|$
are not assumed to be $0$ and $1$, respectively. But this form actually reduces to the special case of Theorem
2: if $\mathrm{Ex}[Z'] = \mu$ and $\mathrm{Ex}[|Z' - \mu|] = \tau$, applying Theorem 2 with $Z = (Z' - \mu)/\tau$ yields the conclusion that
the limiting distribution of $(Z'_1 + \cdots + Z'_n)/n$ is deterministic, with almost sure value $\mu$.
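
The reduction can be spelled out in one line. The following is a sketch of the algebra, assuming $\tau > 0$ (an assumption not stated explicitly above; if $\tau = 0$, then $Z'$ is already deterministic) and writing $Z_i = (Z'_i - \mu)/\tau$ and $S_n = (Z_1 + \cdots + Z_n)/n$:

```latex
\[
\frac{Z'_1 + \cdots + Z'_n}{n}
  = \mu + \tau \cdot \frac{Z_1 + \cdots + Z_n}{n}
  = \mu + \tau S_n ,
\qquad
\mathrm{Ex}[Z_i] = 0, \quad \mathrm{Ex}[|Z_i|] = 1 .
\]
```

Since Theorem 2 gives convergence of $S_n$ to the deterministic value $0$, the left-hand side converges in distribution to $\mu$.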

2. The Proofs 

Our key tactic, which will be used repeatedly in our proofs, is the observation that if $W$ and $X$ are
independent random variables, then

\[
F_{W+X}(w) = \int F_W(w - x) \, dF_X(x) = \mathrm{Ex}[F_W(w - X)].
\]
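
For finitely supported distributions this identity can be checked exactly. The sketch below does so with rational arithmetic; the two example distributions and all function names are hypothetical, chosen only to exercise the formula (the random variables $W$ used later in the paper are continuous, not discrete).

```python
from fractions import Fraction
from itertools import product


def cdf(dist: dict, w) -> Fraction:
    """F(w) = Pr[V <= w] for a finite distribution given as {value: probability}."""
    return sum((p for v, p in dist.items() if v <= w), Fraction(0))


def add_independent(dist_a: dict, dist_b: dict) -> dict:
    """Distribution of A + B when A and B are independent."""
    out = {}
    for (a, pa), (b, pb) in product(dist_a.items(), dist_b.items()):
        out[a + b] = out.get(a + b, Fraction(0)) + pa * pb
    return out


def expected_shifted_cdf(dist_w: dict, dist_x: dict, w) -> Fraction:
    """Ex[F_W(w - X)], i.e. the Stieltjes integral of F_W(w - x) dF_X(x)."""
    return sum((px * cdf(dist_w, w - x) for x, px in dist_x.items()), Fraction(0))


# Hypothetical example distributions (not from the paper).
W = {Fraction(k, 2): Fraction(1, 5) for k in range(-2, 3)}  # uniform on {-1, -1/2, 0, 1/2, 1}
X = {Fraction(-1, 3): Fraction(1, 4), Fraction(2, 3): Fraction(3, 4)}

# Compare both sides of the identity on a grid of rational points.
grid = [Fraction(k, 6) for k in range(-12, 13)]
identity_holds = all(
    cdf(add_independent(W, X), w) == expected_shifted_cdf(W, X, w) for w in grid
)
print(identity_holds)
```

Using `Fraction` keeps both sides exact, so the comparison is an equality test rather than a floating-point tolerance check.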

The strategy of our proof can be described in rough terms as follows. To prove that a sequence $F_{S_n}$
of distributions tends to a limiting distribution $F_T$, it is natural to try to use a metric in the space of
distributions, such as

\[
\| F_{S_n} - F_T \| = \sup_t \, | F_{S_n}(t) - F_T(t) | .
\]



But this metric describes uniform convergence of the distributions, which may not hold in the cases we
consider. (For example, we will not usually even have convergence at $t = 0$ in Theorem 2.) This leads us
to employ the following device. We add to both $S_n$ and $T$ an independent random variable $W$. By giving
$W$ a sufficiently smooth distribution function, we will be able to prove that $F_{W+S_n}$ converges uniformly to
$F_{W+T}$, and by making $W$ sufficiently small in absolute value, even pointwise convergence of $F_{W+S_n}$ to
$F_{W+T}$ will imply the convergence of $F_{S_n}(t)$ to $F_T(t)$ for all $t$ at which $F_T$ is continuous.
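
A toy case shows the effect of the device. In the sketch below, $S_n$ is the constant $1/n$ and $T$ is the constant $0$; these choices, the uniform distribution of $W$ on $[-\delta, \delta]$, and all names are assumptions of this example (a uniform $W$ is less smooth than the beta-distributed variables used later, but it is enough to show the supremum distance shrinking).

```python
def step_cdf(c: float, s: float) -> float:
    """Distribution function of the constant c: Pr[c <= s]."""
    return 1.0 if s >= c else 0.0


def smoothed_cdf(c: float, delta: float, s: float) -> float:
    """F_{W+c}(s) for W uniform on [-delta, delta] and a constant c."""
    return min(1.0, max(0.0, (s - c + delta) / (2.0 * delta)))


def sup_gap(f, g, grid) -> float:
    """Approximate sup_s |f(s) - g(s)| by a maximum over grid points."""
    return max(abs(f(s) - g(s)) for s in grid)


DELTA = 0.5                                      # hypothetical smoothing width
GRID = [k / 1000.0 for k in range(-2000, 2001)]  # includes s = 0


def raw_gap(n: int) -> float:
    """Gap between S_n = 1/n (constant) and T = 0 without smoothing."""
    return sup_gap(lambda s: step_cdf(1.0 / n, s), lambda s: step_cdf(0.0, s), GRID)


def smoothed_gap(n: int) -> float:
    """The same gap after adding W to both S_n and T."""
    return sup_gap(
        lambda s: smoothed_cdf(1.0 / n, DELTA, s),
        lambda s: smoothed_cdf(0.0, DELTA, s),
        GRID,
    )


for n in (10, 100, 1000):
    print(n, raw_gap(n), smoothed_gap(n))
```

Without smoothing the distance is pinned by the jump at $s = 0$; with smoothing it decreases as $n$ grows.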

Lemma 3: Let $S_n$ for $n \ge 1$ and $T$ be random variables. If, for every $\delta > 0$, there exists a random variable
$W$, independent of the $S_n$ and $T$, satisfying $|W| \le \delta$ and such that $F_{S_n+W}(s) \to F_{T+W}(s)$ as $n \to \infty$ for all
$s$, then $F_{S_n}(t) \to F_T(t)$ as $n \to \infty$ for all $t$ at which $F_T$ is continuous.

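Proof: Let $t$ be a point of continuity for $F_T$. Given $\varepsilon > 0$, find $\delta > 0$ such that
$F_T(t + 2\delta) \le F_T(t) + \varepsilon$ and $F_T(t - 2\delta) \ge F_T(t) - \varepsilon$. Let $W$ satisfy the hypotheses of the lemma.
Since $W \le \delta$, we have

\[
\begin{aligned}
F_{S_n}(t) &= \Pr[S_n \le t] \\
&\le \Pr[S_n + W \le t + \delta] \\
&= F_{S_n+W}(t + \delta).
\end{aligned}
\]
Since $W \ge -\delta$, we have

\[
\begin{aligned}
F_{T+W}(t + \delta) &= \Pr[T + W \le t + \delta] \\
&= \Pr[T \le t + \delta - W] \\
&\le \Pr[T \le t + 2\delta] \\
&= F_T(t + 2\delta).
\end{aligned}
\]
Since $\lim_{n \to \infty} F_{S_n+W}(t + \delta) = F_{T+W}(t + \delta)$, we have

\[
\begin{aligned}
F_{S_n}(t) &\le F_{S_n+W}(t + \delta) \\
&\le F_{T+W}(t + \delta) + \varepsilon \\
&\le F_T(t + 2\delta) + \varepsilon \\
&\le F_T(t) + 2\varepsilon
\end{aligned}
\]

for all sufficiently large $n$. A similar argument yields

\[
F_{S_n}(t) \ge F_T(t) - 2\varepsilon
\]

for all sufficiently large $n$. Since these inequalities hold for all $\varepsilon > 0$, we obtain
$\lim_{n \to \infty} F_{S_n}(t) = F_T(t)$, as desired. □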

If $S_n$ is the sum of $n$ independent and identically distributed contributions $X_1, \ldots, X_n$, and $T_n$ is the
sum of $n$ independent and identically distributed contributions $Y_1, \ldots, Y_n$, then we can go from $S_n$ to $T_n$ in
$n$ steps by changing one $X_i$ at each step to $Y_i$. Then $n$ applications of the triangle inequality will allow us
to bound $\| F_{S_n} - F_{T_n} \|$ by $n$ times $\| F_X - F_Y \|$, where $X$ and $Y$ have the common distributions of the $X_i$ and $Y_i$,
respectively. Adding the contribution of $W$ in each case yields the following lemma.

Lemma 4: Let $X, X_1, \ldots, X_n$ be independent and identically distributed random variables, and let
$S_n = X_1 + \cdots + X_n$. Let $Y, Y_1, \ldots, Y_n$ be identically distributed random variables, independent of each
other and the $X_i$, and let $T_n = Y_1 + \cdots + Y_n$. Let $W$ be a random variable independent of $X$, the $X_i$, $Y$, and
the $Y_i$. Then

\[
\| F_{W+S_n} - F_{W+T_n} \| \le n \, \| F_{W+X} - F_{W+Y} \| .
\]

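Proof: For $0 \le i \le n$, let $Q_i = Y_1 + \cdots + Y_i$ and $R_{i+1} = X_{i+1} + \cdots + X_n$ (so that $Q_0 = 0$ and
$R_{n+1} = 0$, being empty sums). Then

\[
\begin{aligned}
F_{W+S_n}(w) - F_{W+T_n}(w)
&= \sum_{1 \le i \le n} \bigl( F_{W+Q_{i-1}+X_i+R_{i+1}}(w) - F_{W+Q_{i-1}+Y_i+R_{i+1}}(w) \bigr) \\
&= \sum_{1 \le i \le n} \mathrm{Ex}\bigl[ F_{W+X_i}(w - Q_{i-1} - R_{i+1}) - F_{W+Y_i}(w - Q_{i-1} - R_{i+1}) \bigr],
\end{aligned}
\]

where the first sum telescopes. Taking absolute values, we obtain

\[
\begin{aligned}
\bigl| F_{W+S_n}(w) - F_{W+T_n}(w) \bigr|
&= \Bigl| \sum_{1 \le i \le n} \mathrm{Ex}\bigl[ F_{W+X_i}(w - Q_{i-1} - R_{i+1}) - F_{W+Y_i}(w - Q_{i-1} - R_{i+1}) \bigr] \Bigr| \\
&\le \sum_{1 \le i \le n} \Bigl| \mathrm{Ex}\bigl[ F_{W+X_i}(w - Q_{i-1} - R_{i+1}) - F_{W+Y_i}(w - Q_{i-1} - R_{i+1}) \bigr] \Bigr| \\
&\le \sum_{1 \le i \le n} \mathrm{Ex}\Bigl[ \bigl| F_{W+X_i}(w - Q_{i-1} - R_{i+1}) - F_{W+Y_i}(w - Q_{i-1} - R_{i+1}) \bigr| \Bigr].
\end{aligned}
\]

Each absolute value in the last right-hand side is bounded by $\| F_{W+X} - F_{W+Y} \|$. Since there are $n$ terms in
the sum, we obtain

\[
\bigl| F_{W+S_n}(w) - F_{W+T_n}(w) \bigr| \le n \, \| F_{W+X} - F_{W+Y} \| .
\]

Since this inequality holds for all $w$, we obtain the conclusion of the lemma. □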
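
The inequality of Lemma 4 can be checked exactly on small discrete examples. The sketch below does this with rational arithmetic; the three distributions, the value $n = 4$, and all function names are hypothetical choices for illustration, and the check is of course no substitute for the proof.

```python
from fractions import Fraction
from functools import reduce
from itertools import product


def cdf(dist: dict, w) -> Fraction:
    """F(w) = Pr[V <= w] for a finite distribution given as {value: probability}."""
    return sum((p for v, p in dist.items() if v <= w), Fraction(0))


def add_independent(dist_a: dict, dist_b: dict) -> dict:
    """Distribution of A + B when A and B are independent."""
    out = {}
    for (a, pa), (b, pb) in product(dist_a.items(), dist_b.items()):
        out[a + b] = out.get(a + b, Fraction(0)) + pa * pb
    return out


def sup_distance(dist_a: dict, dist_b: dict) -> Fraction:
    """sup_w |F_A(w) - F_B(w)|; for step functions it is attained at a support point."""
    points = set(dist_a) | set(dist_b)
    return max(abs(cdf(dist_a, w) - cdf(dist_b, w)) for w in points)


def n_fold_sum(dist: dict, n: int) -> dict:
    """Distribution of the sum of n independent copies of a variable."""
    return reduce(add_independent, [dist] * n)


# Hypothetical example distributions (not from the paper).
half = Fraction(1, 2)
X = {-1: half, 1: half}
Y = {-2: Fraction(1, 8), 0: Fraction(3, 4), 2: Fraction(1, 8)}
W = {Fraction(k, 2): Fraction(1, 5) for k in range(-2, 3)}
n = 4

lhs = sup_distance(add_independent(W, n_fold_sum(X, n)),
                   add_independent(W, n_fold_sum(Y, n)))
rhs = n * sup_distance(add_independent(W, X), add_independent(W, Y))
print(lhs, rhs, lhs <= rhs)
```

Both sides are exact rationals, so the final comparison tests the inequality itself rather than a rounded version of it.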

In Lemma 3, we required $W$ to be small compared with $S_n$ and $T$. But if $S_n$ and $T_n = T$ are each the
sum of $n$ identically distributed contributions $X_i$ and $Y_i$, as in Lemma 4, then these contributions will be
small compared to $W$ when $n$ is large. Thus, if the distribution of $W$ is smooth, it will change little when
the small random variable $X_i$ or $Y_i$ is added. The following lemma and its corollary express this fact in the
form we need.

Lemma 5: Let $W$ and $Z$ be independent random variables, with $F_W$ having two bounded and uniformly
continuous derivatives, and with $Z$ satisfying $\mathrm{Ex}[Z] = 0$ and $\mathrm{Ex}[Z^2] = 1$. Let $X = Z/n^{1/2}$. Then, for every
$\varepsilon > 0$,

\[
\Bigl| F_{W+X}(w) - F_W(w) - \frac{1}{2n} F''_W(w) \Bigr| \le \frac{\varepsilon}{n}
\]

for all sufficiently large $n$ and all $w$.
Proof: We have

\[
F_{W+X}(w) = \mathrm{Ex}[F_W(w - X)].
\]

Since $F_W$ has two continuous derivatives, we may expand it by Taylor's Theorem,

\[
\begin{aligned}
F_W(w - X) &= F_W(w - Z/n^{1/2}) \\
&= F_W(w) - \frac{1}{n^{1/2}} F'_W(w) Z + \frac{1}{2n} F''_W(v(Z)) Z^2 \\
&= F_W(w) - \frac{1}{n^{1/2}} F'_W(w) Z + \frac{1}{2n} F''_W(w) Z^2
   + \frac{1}{2n} \bigl( F''_W(v(Z)) - F''_W(w) \bigr) Z^2 ,
\end{aligned}
\]

where $v$ is a function with $v(Z)$ lying between $w - Z/n^{1/2}$ and $w$, and with $F''_W(v(Z)) Z^2$ integrable because
all the other terms in the equation defining it are integrable. Taking expectations on both sides, and using
$\mathrm{Ex}[Z] = 0$ and $\mathrm{Ex}[Z^2] = 1$, we have

\[
F_{W+X}(w) = F_W(w) + \frac{1}{2n} F''_W(w) + \frac{1}{2n} \mathrm{Ex}\bigl[ \bigl( F''_W(v(Z)) - F''_W(w) \bigr) Z^2 \bigr].
\]
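Thus, given $\varepsilon > 0$, it will suffice to show that the expectation in the last term on the right-hand side has
absolute value at most $2\varepsilon$ for all sufficiently large $n$. Since $F_W$ has a uniformly continuous second
derivative, there exists $\delta > 0$ such that $|v - w| \le \delta$ implies $|F''_W(v) - F''_W(w)| \le \varepsilon$ for all $v$ and $w$. And since
the second derivative is bounded, there exists $M$ such that $|F''_W(v)| \le M$ for all $v$. We break the expectation to
be bounded into two parts,

\[
\begin{aligned}
\mathrm{Ex}\bigl[ ( F''_W(v(Z)) - F''_W(w) ) Z^2 \bigr]
&= \mathrm{Ex}\bigl[ ( F''_W(v(Z)) - F''_W(w) ) Z^2 , \; |Z| \le \delta n^{1/2} \bigr] \\
&\quad + \mathrm{Ex}\bigl[ ( F''_W(v(Z)) - F''_W(w) ) Z^2 , \; |Z| > \delta n^{1/2} \bigr],
\end{aligned}
\]
where $\mathrm{Ex}[g(Z), E] = \int_E g(z) \, dF_Z(z)$ denotes the expectation of $g(Z)$ restricted to the event $E$. Since
$|Z| \le \delta n^{1/2}$ implies $|v(Z) - w| \le \delta$, which in turn implies $|F''_W(v(Z)) - F''_W(w)| \le \varepsilon$, we have

\[
\begin{aligned}
\Bigl| \mathrm{Ex}\bigl[ ( F''_W(v(Z)) - F''_W(w) ) Z^2 , \; |Z| \le \delta n^{1/2} \bigr] \Bigr|
&\le \varepsilon \, \mathrm{Ex}\bigl[ Z^2 , \; |Z| \le \delta n^{1/2} \bigr] \\
&\le \varepsilon \, \mathrm{Ex}[Z^2] \\
&\le \varepsilon .
\end{aligned}
\]

And since $|F''_W(v(Z)) - F''_W(w)| \le 2M$ and $\mathrm{Ex}[Z^2, |Z| > y] \to 0$ as $y \to \infty$, we have

\[
\begin{aligned}
\Bigl| \mathrm{Ex}\bigl[ ( F''_W(v(Z)) - F''_W(w) ) Z^2 , \; |Z| > \delta n^{1/2} \bigr] \Bigr|
&\le 2M \, \mathrm{Ex}\bigl[ Z^2 , \; |Z| > \delta n^{1/2} \bigr] \\
&\le \varepsilon
\end{aligned}
\]

for all sufficiently large $n$. □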
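
The content of Lemma 5 is that the error, after multiplication by $n$, tends to $0$ uniformly in $w$. The following sketch checks this numerically for one concrete pair: $W = \delta(2B - 1)$ with $B \sim \mathrm{Beta}(3, 3)$, whose distribution function is an explicit polynomial, and $Z$ equal to $+1$ or $-1$ with probability $1/2$ each. The value of `DELTA`, the grid, the choice of $Z$, and the function names are assumptions of this example.

```python
import math

DELTA = 1.0  # hypothetical smoothing width


def _unit(w: float) -> float:
    """Map w in [-DELTA, DELTA] to x in [0, 1], clipping outside that range."""
    return min(1.0, max(0.0, (w + DELTA) / (2.0 * DELTA)))


def F_W(w: float) -> float:
    """Distribution function of W = DELTA (2B - 1), B ~ Beta(3, 3)."""
    x = _unit(w)
    return 10 * x**3 - 15 * x**4 + 6 * x**5


def F_W_second(w: float) -> float:
    """Second derivative of F_W; it vanishes at and beyond the endpoints."""
    x = _unit(w)
    return 60 * x * (1 - x) * (1 - 2 * x) / (4.0 * DELTA**2)


def scaled_error(n: int, w: float) -> float:
    """n |F_{W+X}(w) - F_W(w) - F_W''(w) / (2n)| for X = Z / sqrt(n), Z = +-1."""
    h = 1.0 / math.sqrt(n)
    f_sum = 0.5 * (F_W(w - h) + F_W(w + h))  # Ex[F_W(w - X)] for this two-point Z
    return n * abs(f_sum - F_W(w) - F_W_second(w) / (2.0 * n))


def worst_scaled_error(n: int, grid) -> float:
    """Maximum of the scaled error over the grid points."""
    return max(scaled_error(n, w) for w in grid)


GRID = [k / 100.0 for k in range(-150, 151)]
for n in (100, 10_000, 1_000_000):
    print(n, worst_scaled_error(n, GRID))
```

A maximum over a finite grid only approximates the supremum over all $w$, so the printed values indicate the trend rather than certify the bound.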

Corollary 6: Let $W$, $Z$ and $Z'$ be independent random variables, with $F_W$ having two bounded and uniformly
continuous derivatives, and with $Z$ and $Z'$ satisfying $\mathrm{Ex}[Z] = \mathrm{Ex}[Z'] = 0$ and $\mathrm{Ex}[Z^2] = \mathrm{Ex}[Z'^2] = 1$. Let
$X = Z/n^{1/2}$ and $X' = Z'/n^{1/2}$. Then, for every $\varepsilon > 0$,

\[
\| F_{W+X} - F_{W+X'} \| \le \frac{\varepsilon}{n}
\]

for all sufficiently large $n$.

Proof: Apply Lemma 5 with $\varepsilon/2$ for $\varepsilon$, and $Z$ and $Z'$ in turn for $Z$, and use the triangle inequality for
absolute value. □

Proof of Theorem 1: Since $F_N$ is continuous, we may use Lemma 3 to show convergence at all points. Given
$\delta > 0$, we take $W$ to be the median of five independent random variables, each uniformly distributed in the
interval $[-\delta, \delta]$ (equivalently, $W = \delta(2B - 1)$, where $B \sim \mathrm{Beta}(3, 3)$). This random variable clearly meets
the conditions of Lemma 3 and, since the first two derivatives of its distribution function vanish at $\pm\delta$ and
it varies only over a closed and bounded interval, it also meets the conditions of Lemma 5 and Corollary 6.
Taking $X = Z/n^{1/2}$, and taking $Z'$ to be the normal random variable $N$ and $Y = N/n^{1/2}$ in Corollary 6, we
conclude that, for every $\varepsilon > 0$,

\[
\| F_{W+X} - F_{W+Y} \| \le \frac{\varepsilon}{n}
\]

for all sufficiently large $n$. Lemma 4 then allows us to conclude that, for every $\varepsilon > 0$ and all sufficiently
large $n$,

\[
\| F_{W+S_n} - F_{W+T_n} \| \le \varepsilon ,
\]

where $S_n = X_1 + \cdots + X_n$ (with the $X_i$ independently distributed like $X$) and $T_n = Y_1 + \cdots + Y_n$ (with the
$Y_i$ independently distributed like $Y$). Since $T_n$ is the sum of $n$ independent and normally distributed random
variables, each having variance $1/n$, it has the distribution of the standard normal random variable $N$. Thus
we obtain

\[
\| F_{W+S_n} - F_{W+N} \| \le \varepsilon .
\]

We can now apply Lemma 3 with $N$ for $T$, and conclude that $F_{S_n}(t)$ converges to $F_N(t)$ for all $t$. □
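
The equivalence between the median of uniform variables and a beta distribution, used in the choice of $W$, can be verified exactly on the unit interval (the rescaling to $[-\delta, \delta]$ is the affine map $B \mapsto \delta(2B - 1)$). The sketch below compares the two distribution functions with rational arithmetic for a median of $2m - 1$ uniform variables and $\mathrm{Beta}(m, m)$; the function names and the grid of test points are assumptions of this example.

```python
from fractions import Fraction
from math import comb, factorial


def median_cdf(m: int, x: Fraction) -> Fraction:
    """Pr[median of 2m-1 independent Uniform(0,1) variables <= x].

    The median is <= x exactly when at least m of the 2m-1 variables are <= x.
    """
    n = 2 * m - 1
    return sum(
        (comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(m, n + 1)),
        Fraction(0),
    )


def beta_mm_cdf(m: int, x: Fraction) -> Fraction:
    """Distribution function of Beta(m, m) at x, integrating the density term by term.

    The density is t^(m-1) (1-t)^(m-1) / B(m, m), with 1 / B(m, m) = (2m-1)! / ((m-1)!)^2.
    """
    norm = Fraction(factorial(2 * m - 1), factorial(m - 1) ** 2)
    integral = sum(
        (Fraction(comb(m - 1, j) * (-1) ** j, m + j) * x ** (m + j) for j in range(m)),
        Fraction(0),
    )
    return norm * integral


# m = 3 is the median of five uniform variables.
xs = [Fraction(k, 10) for k in range(11)]
print(all(median_cdf(3, x) == beta_mm_cdf(3, x) for x in xs))
```

For $m = 3$ both functions reduce to the polynomial $10x^3 - 15x^4 + 6x^5$, whose first two derivatives vanish at $x = 0$ and $x = 1$.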

Our proof of Theorem 2 is even simpler than our proof of Theorem 1.

Lemma 7: Let $W$ and $Z$ be independent random variables, with $F_W$ having a bounded and uniformly
continuous derivative, and with $Z$ satisfying $\mathrm{Ex}[Z] = 0$ and $\mathrm{Ex}[|Z|] \le 1$. Let $X = Z/n$. Then, for every
$\varepsilon > 0$,

\[
\bigl| F_{W+X}(w) - F_W(w) \bigr| \le \frac{\varepsilon}{n}
\]

for all sufficiently large $n$ and all $w$.

Proof: We have

\[
F_{W+X}(w) = \mathrm{Ex}[F_W(w - X)].
\]

Since $F_W$ has a continuous derivative, we may expand it by Taylor's Theorem,

\[
\begin{aligned}
F_W(w - X) &= F_W(w - Z/n) \\
&= F_W(w) - \frac{1}{n} F'_W(v(Z)) Z \\
&= F_W(w) - \frac{1}{n} F'_W(w) Z - \frac{1}{n} \bigl( F'_W(v(Z)) - F'_W(w) \bigr) Z ,
\end{aligned}
\]

where $v$ is a function with $v(Z)$ lying between $w - Z/n$ and $w$, and with $F'_W(v(Z)) Z$ integrable because all
the other terms in the equation defining it are integrable. Taking expectations on both sides, and using
$\mathrm{Ex}[Z] = 0$, we have

\[
F_{W+X}(w) = F_W(w) - \frac{1}{n} \mathrm{Ex}\bigl[ \bigl( F'_W(v(Z)) - F'_W(w) \bigr) Z \bigr].
\]

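Thus, given $\varepsilon > 0$, it will suffice to show that the expectation in the last term on the right-hand side has
absolute value at most $2\varepsilon$ for all sufficiently large $n$ (since $\varepsilon$ is arbitrary, the resulting bound $2\varepsilon/n$ is
as good as $\varepsilon/n$). Since $F_W$ has a uniformly continuous derivative, there exists $\delta > 0$ such that
$|v - w| \le \delta$ implies $|F'_W(v) - F'_W(w)| \le \varepsilon$ for all $v$ and $w$. And since the derivative is bounded, there exists
$M$ such that $|F'_W(v)| \le M$ for all $v$. We break the expectation to be bounded into two parts,

\[
\begin{aligned}
\mathrm{Ex}\bigl[ ( F'_W(v(Z)) - F'_W(w) ) Z \bigr]
&= \mathrm{Ex}\bigl[ ( F'_W(v(Z)) - F'_W(w) ) Z , \; |Z| \le \delta n \bigr] \\
&\quad + \mathrm{Ex}\bigl[ ( F'_W(v(Z)) - F'_W(w) ) Z , \; |Z| > \delta n \bigr].
\end{aligned}
\]
Since $|Z| \le \delta n$ implies $|v(Z) - w| \le \delta$, which in turn implies $|F'_W(v(Z)) - F'_W(w)| \le \varepsilon$, we have

\[
\begin{aligned}
\Bigl| \mathrm{Ex}\bigl[ ( F'_W(v(Z)) - F'_W(w) ) Z , \; |Z| \le \delta n \bigr] \Bigr|
&\le \varepsilon \, \mathrm{Ex}\bigl[ |Z| , \; |Z| \le \delta n \bigr] \\
&\le \varepsilon \, \mathrm{Ex}[|Z|] \\
&\le \varepsilon .
\end{aligned}
\]

And since $|F'_W(v(Z)) - F'_W(w)| \le 2M$ and $\mathrm{Ex}[|Z|, |Z| > y] \to 0$ as $y \to \infty$, we have

\[
\begin{aligned}
\Bigl| \mathrm{Ex}\bigl[ ( F'_W(v(Z)) - F'_W(w) ) Z , \; |Z| > \delta n \bigr] \Bigr|
&\le 2M \, \mathrm{Ex}\bigl[ |Z| , \; |Z| > \delta n \bigr] \\
&\le \varepsilon
\end{aligned}
\]

for all sufficiently large $n$. □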

Corollary 8: Let $W$, $Z$ and $Z'$ be independent random variables, with $F_W$ having a bounded and uniformly
continuous derivative, and with $Z$ and $Z'$ satisfying $\mathrm{Ex}[Z] = \mathrm{Ex}[Z'] = 0$ and $\mathrm{Ex}[|Z|], \mathrm{Ex}[|Z'|] \le 1$. Let
$X = Z/n$ and $X' = Z'/n$. Then, for every $\varepsilon > 0$,

\[
\| F_{W+X} - F_{W+X'} \| \le \frac{\varepsilon}{n}
\]

for all sufficiently large $n$.

Proof: Apply Lemma 7 with $\varepsilon/2$ for $\varepsilon$, and $Z$ and $Z'$ in turn for $Z$, and use the triangle inequality for
absolute value. □

Proof of Theorem 2: Since $F_D$ is continuous at all points other than $0$, we may use Lemma 3 to show
convergence at all points other than $0$. Given $\delta > 0$, we take $W$ to be the median of three independent
random variables, each uniformly distributed in the interval $[-\delta, \delta]$ (equivalently, $W = \delta(2B - 1)$, where
$B \sim \mathrm{Beta}(2, 2)$). This random variable clearly meets the conditions of Lemma 3 and, since the first derivative
of its distribution function vanishes at $\pm\delta$ and it varies only over a closed and bounded interval, it also
meets the conditions of Lemma 7 and Corollary 8. Taking $X = Z/n$, and taking $Z'$ to be the deterministic
random variable $D$ and $Y = D/n = D$ in Corollary 8, we conclude that, for every $\varepsilon > 0$,

\[
\| F_{W+X} - F_{W+Y} \| \le \frac{\varepsilon}{n}
\]

for all sufficiently large $n$. Lemma 4 then allows us to conclude that, for every $\varepsilon > 0$ and all sufficiently
large $n$,

\[
\| F_{W+S_n} - F_{W+T_n} \| \le \varepsilon ,
\]

where $S_n = X_1 + \cdots + X_n$ (with the $X_i$ independently distributed like $X$) and $T_n = Y_1 + \cdots + Y_n$ (with the
$Y_i$ independently distributed like $Y$). Since $T_n$ is the sum of $n$ deterministic random variables, each having
mean $0$, it has the distribution of the deterministic random variable $D$. Thus we obtain

\[
\| F_{W+S_n} - F_{W+D} \| \le \varepsilon .
\]

We can now apply Lemma 3 with $D$ for $T$, and conclude that $F_{S_n}(t)$ converges to $F_D(t)$ for all $t$ at which
$F_D$ is continuous (that is, all $t \ne 0$). □